Abstract

Trypanosoma cruzi, the aetiological agent of Chagas disease, possesses extensive genetic diversity. This has led to the
development of a plethora of molecular typing methods, both for the identification of the known major genetic lineages
and for more fine scale characterization of different multilocus genotypes within these major lineages. Whole genome
sequencing applied to large sample sizes is not currently viable and multilocus enzyme electrophoresis, the previous gold
standard for T. cruzi typing, is laborious and time consuming. In the present work, we present an optimized Multilocus
Sequence Typing (MLST) scheme, based on the combined analysis of two recently proposed MLST approaches. Here,
thirteen concatenated gene fragments were applied to a panel of T. cruzi reference strains encompassing all known genetic
lineages. Concatenation of 13 fragments allowed assignment of all strains to the predicted Discrete Typing Units (DTUs), or
near-clades, with the exception of one strain that was an outlier for TcV, due to apparent loss of heterozygosity in one
fragment. Monophyly for all DTUs, along with robust bootstrap support, was restored when this fragment was subsequently
excluded from the analysis. All possible combinations of loci were assessed against predefined criteria with the objective of
selecting the most appropriate combination of between two and twelve fragments for an optimized MLST scheme. The
optimum combination consisted of 7 loci and discriminated between all reference strains in the panel, with the majority
supported by robust bootstrap values. Additionally, a reduced panel of just 4 gene fragments displayed high bootstrap
values for DTU assignment and discriminated 21 out of 25 genotypes. We propose that the seven-fragment MLST scheme
could be used as a gold standard for T. cruzi typing, against which other typing approaches, particularly single locus
approaches or systematic PCR assays based on amplicon size, could be compared.
Author Summary

The single-celled parasite Trypanosoma cruzi occurs in mammals and insect vectors in the Americas. When
transmitted to humans it causes Chagas disease (American trypanosomiasis), a major public health problem.
T. cruzi is genetically diverse and currently split into six groups, known as TcI to TcVI. Multilocus
sequence typing (MLST) is a method used for studying the population structure and diversity of pathogens
and involves sequencing DNA of several different genes and comparing the sequences between isolates. Here,
we assess 13 T. cruzi genes and select the best combination for diversity studies. Outputs reveal that a
combination of 7 genes can be used for both lineage assignment and high resolution studies of genetic
diversity, and a reduced combination of four loci for lineage assignment. Application of MLST for assigning
field isolates of T. cruzi to genetic groups and for detailed investigation of diversity provides a valuable
approach to understanding the taxonomy, population structure, genetics, ecology and epidemiology of this
important human pathogen.

Introduction

Trypanosoma cruzi, the protozoan causative agent of Chagas disease, is a monophyletic and genetically
heterogeneous taxon, with at least six phylogenetic lineages formally recognised as Discrete Typing Units
(DTUs), TcI-TcVI [1], or near-clades (clades that are blurred by infrequent inter-lineage genetic
recombination [2]). T. cruzi is considered to have a predominantly clonal population structure but with at
least some intra-lineage recombination [3,4,5,6]. TcI and TcII are the most genetically distant groups, and
the evolutionary origins of TcIII and TcIV remain controversial. Based on sequencing of individual nuclear
genes, Westenberger et al. [7] suggested that an ancient hybridisation event occurred between TcI and TcII
followed by a long period of clonal propagation leading to the extant TcIII and TcIV. Alternatively, de
Freitas et al. [8] suggested that TcIII and TcIV have a separate evolutionary ancestry with mitochondrial
sequences that are similar to each other but distinct from both TcI and TcII. Recently, Flores-Lopez and
Machado [9] proposed that TcIII and TcIV have no hybrid origin. Based on the sequences of 32 genes, they
strongly suggested that TcI, TcIII and TcIV are clustered into a major clade that diverged from TcII around
1-2 million years ago. Less controversially, it is clear that TcV and TcVI, both overwhelmingly represented
in the domestic transmission cycles in the Southern Cone region of South America, are hybrid lineages
sharing haplotypes from both TcII and TcIII, with both DTUs retaining the mitochondrial genome of TcIII
[8,10]. Recent phylogenetic studies suggest that the emergence of the hybrid lineages TcV and TcVI may have
occurred within the last 60,000 years [11].

Reliable DTU identification and the potential for high resolution investigation of genotypes at the intra
DTU level are of great interest for epidemiological, host association, clinical and phylogenetic studies.
Historically, a plethora of typing techniques have been applied to T. cruzi. Initial pioneering work applied
multilocus enzyme electrophoresis (MLEE) techniques [12,13,14,15,16,17,18,19,20], revealing the remarkable
genetic heterogeneity of this parasite. Subsequently, several PCR-based typing assays have been designed to
differentiate the main DTUs [21,22,23,24] and, more recently, combinations of PCR-RFLP schemes have been
published [25,26,27]. Some approaches based on DTU characterisation by direct sequential PCR amplifications
from blood and tissue samples are also promising, although various sensitivity and reliability issues need
to be resolved [28,29,30]. Microsatellite typing (MLMT) has also been applied to population data for
fine-scale intra DTU genetic analysis [31,32,33].

Multilocus sequence typing (MLST), originally developed for 
bacterial species typing, has now been applied to a wide range of 
prokaryotic [34,35,36,37] and increasingly eukaryotic microor- 
ganisms [38,39,40,41,42,43,44,45,46,47,48]. The technique typi- 
cally involves the sequencing and concatenation of six to ten 
internal fragments of single copy housekeeping genes per strain 
[49]. Data are often hosted on interactive open access databases 
such as MLST.net for use in the wider research community. A 
major advantage of MLST analysis is that sequence data are 
unambiguous, minimizing interpretative errors. In this context, the 
MLST approach represents an excellent candidate to become the 
gold standard for T. cruzi genetic typing with outputs suitable for 
phylogenetic and epidemiological studies, particularly where large 
numbers of isolates from varied sources are under study. 

Recently, two multilocus sequence typing (MLST) schemes have 
been developed in parallel for T. cruzi, each of them based on 
different gene targets [50,51]. Both schemes display a high 
discriminatory power and are able to clearly differentiate the main 
T. cruzi DTUs. The current work proposes to resolve the 
optimum combination of loci across the two schemes to produce 
a reproducible and robust formalised MLST scheme validated 
across a shared reference panel of isolates for practical use by the 
wider T. cruzi research community. 

Methods 

Parasite strains and DNA isolation 

Twenty-five cloned reference strains belonging to the six known DTUs were examined (Table 1). These strains
have been widely used as reference strains in many previous studies, and are regularly examined in our
laboratory by Multilocus Enzyme Electrophoresis (MLEE). Parasite stocks were cultivated at 28°C in liver
infusion tryptose (LIT) supplemented with 1% hemin, 10% fetal bovine serum, 100 units/ml of penicillin, and
100 µg/ml of streptomycin, or in supplemented RPMI liquid medium.

MLST loci 

Initially a total of 19 gene fragments were considered: 10 housekeeping genes previously described by
Lauthier et al. [50] [Glutathione peroxidase (GPX), 3-hydroxy-3-methylglutaryl-CoA reductase (HMCOAR),
Pyruvate dehydrogenase component E1 subunit alpha (PDH), Small GTP-binding protein Rab7 (GTP),
Serine/threonine-protein phosphatase PP1 (STPP2), Rho-like GTP binding protein (RHO1), Glucose-6-phosphate
isomerase (GPI), Superoxide dismutase A (SODA), Superoxide dismutase B (SODB) and Leucine aminopeptidase
(LAP)]; and 9 gene fragments from Yeo et al. [51] [ascorbate-dependent haemoperoxidase (TcAPX),
dihydrofolate reductase-thymidylate synthase (DHFR-TS), glutathione-dependent peroxidase II (TcGPXII),
mitochondrial peroxidase (TcMPX), trypanothione reductase (TR), RNA-binding protein-19 (RB19),
metacyclin-II (Met-II), metacyclin-III (Met-III) and LYT1]. However, 6 of them were discarded based on
initial findings [50,51]. Although some of the excluded targets were informative, they were not amenable to
routine use. More specifically, LYT1 was discarded due to unreliable PCR amplification and sequencing
despite multiple attempts at optimization; TR, DHFR-TS and TcAPX were deemed unsuitable because internal
sequencing primers were required; finally, Met-III and TcGPXII were excluded because they generated
non-specific PCR products with some isolates.

The final 13 gene fragments assessed included 3 fragments described by Yeo et al. [51] and the 10
housekeeping genes previously described by Lauthier et al. [50]. These were: TcMPX, RB19, Met-II, SODA,
SODB, LAP, GPI, GPX, PDH, HMCOAR, RHO1, GTP and STPP2. For the 13 loci under study, searches in the
CL Brener and Sylvio X10 genomes (http://tritrypdb.org/tritrypdb/), using the primer sequences as well as
the fragment sequences as queries, returned single matches in all cases. Chromosome location, primer
sequences and amplicon size for each target are shown in Table 2. Nucleotide sequences for all the analysed
MLST targets are available from GenBank under the following accession numbers: JN129501-JN129502,
JN129511-JN129518, JN129523-JN129524, JN129534-JN129535, JN129544-JN129551, JN129556-JN129557,
JN129567-JN129568, JN129577-JN129584, JN129589-JN129590, JN129600-JN129601, JN129610-JN129617,
JN129622-JN129623, JN129633-JN129634, JN129643-JN129650, JN129655-JN129656, JN129666-JN129667,
JN129676-JN129683, JN129688-JN129689, JN129699-JN129700, JN129709-JN129716, JN129721-JN129722,
JN129732-JN129733, JN129742-JN129749, JN129754-JN129755, JN129765-JN129766, JN129775-JN129782,
JN129787-JN129788, JN129798-JN129799, JN129808-JN129815, JN129820-JN129821, KF889442-KF889646.
Additionally, we used T. cruzi marinkellei as the outgroup. Sequence data of the selected targets for
T. cruzi marinkellei were obtained from TriTrypDB (http://tritrypdb.org), under the following accession
IDs: TcMARK_CONTIG_2686, TcMARK_CONTIG_670, TcMARK_CONTIG_1404, Tc_MARK_2068, Tc_MARK_3409, Tc_MARK_5695,
Tc_MARK_9874, Tc_MARK_515, Tc_MARK_4984, Tc_MARK_5926, Tc_MARK_8923, TcMARK_CONTIG_1818 and Tc_MARK_2666.

Molecular methods 

PCRs were performed in 50 µl reaction volumes containing 100 ng of DNA, 0.2 µM of each primer, 1 U of GoTaq
DNA polymerase (Promega), 10 µl of buffer (supplied with the GoTaq polymerase) and a 50 µM concentration of
each deoxynucleoside triphosphate (Promega). Amplification conditions for all targets were: 5 min at 94°C
followed by 35 cycles of 94°C for 1 min, 55°C for 1 min and 72°C for 1 min, with a final extension at 72°C
for 5 min. Amplified fragments were purified (QIAquick, Qiagen) and sequenced in both directions (ABI PRISM
310 Genetic Analyzer or ABI PRISM 377 DNA Sequencers, Applied Biosystems) using standard protocols. Primers
used for sequencing were identical to those used in PCR amplifications. In order to assess reproducibility,
each PCR amplification was performed multiple times and associated sequencing was repeated at least twice.

Table 1. Cohort of clonal reference isolates representing the six known T. cruzi lineages (DTUs).

| Strain | DTU | Origin | Host |
|---|---|---|---|
| 1. X10 cl1 | TcI | Belem, Brazil | Homo sapiens |
| 2. Cutia cl1 | TcI | Espiritu Santo, Brazil | Dasyprocta aguti |
| 3. Sp104 cl1 | TcI | Region IV, Chile | Triatoma spinolai |
| 4. P209 cl93 | TcI | Sucre, Bolivia | Homo sapiens |
| 5. OPS21 cl11 | TcI | Cojedes, Venezuela | Homo sapiens |
| 6. 92101601P cl1 | TcI | Georgia, USA | Didelphis marsupialis |
| 7. TU18 cl93 | TcII | Potosi, Bolivia | Triatoma infestans |
| 8. CBB cl3 | TcII | Region IV, Chile | Homo sapiens |
| 9. Mas cl1 | TcII | Federal District, Brazil | Homo sapiens |
| 10. IVV cl4 | TcII | Region IV, Chile | Homo sapiens |
| 11. Esm cl3 | TcII | Sao Felipe, Brazil | Homo sapiens |
| 12. M5631 cl5 | TcIII | Selva Terra, Brazil | Dasypus novemcinctus |
| 13. M6241 cl6 | TcIII | Belem, Brazil | Homo sapiens |
| 14. CM17 | TcIII | Meta, Colombia | Dasypus sp. |
| 15. X109/2 | TcIII | Makthlawaiya, Paraguay | Canis familiaris |
| 16. 92122102R | TcIV | Georgia, USA | Procyon lotor |
| 17. CanIII cl1 | TcIV | Belem, Brazil | Homo sapiens |
| 18. Dog Theis | TcIV | USA | Canis familiaris |
| 19. Mn cl2 | TcV | Region IV, Chile | Homo sapiens |
| 20. Bug 2148 cl1 | TcV | Rio Grande do Sul, Brazil | Triatoma infestans |
| 21. SO3 cl5 | TcV | Potosi, Bolivia | Triatoma infestans |
| 22. SC43 cl1 | TcV | Santa-Cruz, Bolivia | Triatoma infestans |
| 23. CL Brener | TcVI | Rio Grande do Sul, Brazil | Triatoma infestans |
| 24. P63 cl1 | TcVI | Makthlawaiya, Paraguay | Triatoma infestans |
| 25. Tula cl2 | TcVI | Talahuen, Chile | Homo sapiens |

doi:10.1371/journal.pntd.0003117.t001

Data analysis 

MLST data were analysed with MLSTest software (http://ipe.unsa.edu.ar/software) [52] with the objective of
identifying the minimum number of targets giving the highest resolution for unequivocal DTU assignment and
potential fine scale characterisation. MLSTest contains a suite of MLST data specific analytical tools.
Briefly, single nucleotide polymorphisms (SNPs) were identified in all loci in the MLSTest alignment viewer.
Typing efficiency (TE) was calculated using the same software. TE for a given locus is calculated as the
number of identified genotypes divided by the number of polymorphic sites in that locus. Additionally,
discriminatory power, defined as the probability that two strains are distinguished when chosen at random
from a population of unrelated strains [53], was determined for each target (Table 3).
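
The two per-locus statistics defined above can be illustrated with a short Python sketch. It is not the MLSTest implementation: the function names are hypothetical, and the discriminatory power is written here as a Simpson-type index (two strains drawn without replacement), which is assumed to correspond to the definition cited as [53].

```python
from collections import Counter


def typing_efficiency(n_genotypes: int, n_polymorphic_sites: int) -> float:
    """Genotypes identified per polymorphic site, as defined in the text."""
    if n_polymorphic_sites <= 0:
        raise ValueError("a locus needs at least one polymorphic site")
    return n_genotypes / n_polymorphic_sites


def discriminatory_power(genotype_labels) -> float:
    """Probability that two strains drawn at random (without replacement)
    carry different genotypes. Assumed form of the index cited as [53]."""
    counts = Counter(genotype_labels)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("at least two strains are required")
    # Number of ordered pairs of strains that share a genotype
    same_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_pairs / (n * (n - 1))
```

With the values reported for GPI in Table 3 (9 genotypes, 18 polymorphic sites), `typing_efficiency(9, 18)` gives 0.5. A panel in which every strain has a distinct genotype gives a discriminatory power of 1.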



Table 2. Details of gene targets.

| Gene | Gene ID (a) | Chromosome number | Primer sequence (5'-3') | Amplicon size (bp) (b) | Sequence start 5' (c) | Fragment length (bp) (d) |
|---|---|---|---|---|---|---|
| GPI*† | Tc00.1047053506529.508 | 6 | CGCCATGTTGTGAATATTGG (20) | 405 | 21 | 365 |
| | | | GGCGGACCACAATGAGTATC (20) | | | |
| HMCOAR*† | Tc00.1047053506831.40 | 32 | AGGAGGCTTTTGAGTCCACA (20) | 554 | 21 | 514 |
| | | | TCCAACAACACCAACCTCAA (20) | | | |
| RHO1*† | Tc00.1047053506649.40 | 8 | AGTTGCTGCTTCCCATCAAT (20) | 455 | 21 | 415 |
| | | | CTGCACAGTGTATGCCTGCT (20) | | | |
| TcMPX*† | Tc00.1047053509499.14 | 22 | ATGTTTCGTCGTATGGCC (18) | 678 | 109 | 505 |
| | | | TGCGTTTTTCTCAAAATATTC (21) | | | |
| LAP* | Tc00.1047053508799.240 | 27 | TGTACATGTTGCTTGGCTGAG (21) | 444 | 22 | 402 |
| | | | GCTGAGGTGATTAGCGACAAA (21) | | | |
| SODB* | Tc00.1047053507039.10 | 35 | GCCCCATCTTCAACCTT (17) | 313 | 18 | 266 |
| | | | TAGTACGCATGCTCCCATA (19) | | | |
| RB19* | Tc00.1047053507515.60 | 29 | GCCTACACCGAGGAGTACCA (20) | 408 | 49 | 340 |
| | | | TTCTCCAATCCCCAGACTTG (20) | | | |
| GPX | Tc00.1047053511543.60 | 35 | CGTGGCACTCTCCAATTACA (20) | 360 | 21 | 321 |
| | | | AATTTAACCAGCGGGATGC (19) | | | |
| PDH | Tc00.1047053507831.70 | 40 | GGGGCAAGTGTTTGAAGCTA (20) | 491 | 21 | 451 |
| | | | AGAGCTCGCTTCGAGGTGTA (20) | | | |
| GTP | Tc00.1047053503689.10 | 12 | TGTGACGGGACATTTTACGA (20) | 561 | 21 | 521 |
| | | | CCCCTCGATCTCACGATTTA (20) | | | |
| SODA | Tc00.1047053509775.40 | 21 | CCACAAGGCGTATGTGGAC (19) | 300 | 20 | 263 |
| | | | ACGCACAGCCACGTCCAA (18) | | | |
| STPP2 | Tc00.1047053507673.10 | 34 | CCGTGAAGCTTTTCAAGGAG (20) | 409 | 21 | 369 |
| | | | GCCCCACTGTTCGTAAACTC (20) | | | |
| Met-II | Tc00.1047053510889.280 | 6 | TCATCTGCACCGATGAGTTC (20) | 700 | 51 | 389 |
| | | | CTCCATAGCGTTGACGAACA (20) | | | |

*Gene fragments included in the 7 loci MLST scheme;
†Gene fragments included in the reduced 4 loci MLST scheme;
(a) Gene ID: GenBank accession number for the complete gene in the CL Brener strain;
(b) Amplicon size refers to the sequence size of the gene fragment including the primer regions;
(c) 5' starting position: indicates the position where the analyzed sequence starts, counting from the first base of the amplicon;
(d) Fragment length refers to the sequence length used for the analyses (the analyzed fragments do not include the primer regions).
doi:10.1371/journal.pntd.0003117.t002

Sequence data were concatenated and Neighbour Joining phylogenetic trees were generated using uncorrected
p-distances. Heterozygous sites were handled in the analyses using two different methods. First, the SNP
duplication method described by Yeo et al. and Tavanti et al. [51,54] was implemented. Briefly, the SNP
duplication method involves the elimination of monomorphic sites and duplication of polymorphisms in order
to "resolve" the heterozygous sites. As an example, a homozygous variable site scored as C (cytosine) is
replaced by CC, while a heterozygous site, for example Y (C or T, in accordance with IUPAC nomenclature), is
scored as CT. Alternatively, heterozygous SNPs were considered as average states. In more detail, the
genetic distance between T and Y (heterozygosity composed of T and C) is taken as the mean distance between
T and the possible resolutions of Y (distance T-T = 0 and distance T-C = 1, average distance = 0.5; see [53]
and the MLSTest 1.0 manual at http://www.ipe.unsa.edu.ar/software for further details). Statistical support
was evaluated by 1000 bootstrap replications. Overall phylogenetic incongruence among loci (by comparison
with the concatenated topology) was assessed by the Incongruence Length Difference Test using the
BIO-Neighbour Joining method (BIONJ-ILD, [55]) and evaluated by a permutation test with 1,000 replications.
Briefly, the ILD evaluates whether the observed incongruence among fragments is higher than that expected by
random unstructured homoplasy across the different fragments. A statistically significant ILD p value
indicates that many sites, in at least one fragment, support a phylogeny that is contradicted by other
fragments. In order to localize significant incongruent branches in concatenated data we used the Neighbour
Joining based Localized Incongruence Length Difference (NJ-LILD) test available in MLSTest. NJ-LILD is a
variant of the ILD test that allows localizing incongruence at branch level.
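
The two ways of handling heterozygous sites described in this section can be sketched in Python as follows. This is an illustrative reading of the text, not MLSTest code. The function names are hypothetical, only two-base IUPAC ambiguity codes are covered, and treating two identical codes as distance 0 is a choice made for this sketch that MLSTest may handle differently.

```python
# IUPAC nucleotide codes mapped to the bases they represent
# (only unambiguous bases and two-base ambiguity codes are covered).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}


def variable_columns(seqs):
    """Indices of alignment columns that are not monomorphic."""
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]


def snp_duplicate(seq):
    """SNP duplication: every site becomes two bases (C -> CC, Y -> CT)."""
    out = []
    for base in seq.upper():
        states = IUPAC[base]
        out.append(states * 2 if len(states) == 1 else states)
    return "".join(out)


def site_distance(a, b):
    """Average-states distance at one site: the mean mismatch over all
    pairings of the possible resolutions of a and b."""
    a, b = a.upper(), b.upper()
    if a == b:
        return 0.0  # assumption of this sketch: identical codes do not differ
    ra, rb = IUPAC[a], IUPAC[b]
    mismatches = sum(x != y for x in ra for y in rb)
    return mismatches / (len(ra) * len(rb))


def p_distance(s1, s2):
    """Uncorrected p-distance with heterozygous sites as average states."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(site_distance(a, b) for a, b in zip(s1, s2)) / len(s1)
```

For the example in the text, `site_distance("T", "Y")` averages the T-T distance (0) and the T-C distance (1) to give 0.5, and `snp_duplicate("CY")` returns `"CCCT"`. In the SNP duplication method the monomorphic columns would first be dropped, for instance with `variable_columns`, before duplication is applied.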

All combinations from 2 to 12 fragments were analysed using 
the scheme optimisation algorithm in MLSTest which identifies 
the combination of loci producing the maximum number of 
diploid sequence types (DSTs). Three main sequential criteria 
were applied to select the optimum combination of loci: firstly, 
monophyly of DTUs and lineage assignment; secondly, robust 
bootstrap values for the six major DTUs (1000 replications); and 
thirdly detection of genetic diversity at the intra-DTU level. 
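
The exhaustive search over locus combinations can be sketched as below. This is a simplified illustration rather than the MLSTest algorithm. It ranks combinations only by the number of DSTs they distinguish and does not model the monophyly and bootstrap criteria, which take priority in the procedure described above. The function names and the toy allele profiles are hypothetical.

```python
from itertools import combinations
from math import comb


def count_dsts(profiles, loci):
    """Count distinct diploid sequence types (DSTs) over the chosen loci.

    profiles maps strain name -> {locus name: allele identifier}.
    """
    return len({tuple(alleles[locus] for locus in loci)
                for alleles in profiles.values()})


def best_combinations(profiles, all_loci, size):
    """Return the highest DST count and every locus combination of the
    given size that reaches it (exhaustive search)."""
    scored = [(count_dsts(profiles, combo), combo)
              for combo in combinations(all_loci, size)]
    top = max(score for score, _ in scored)
    return top, [combo for score, combo in scored if score == top]


# Size of the search space for 13 candidate loci
N_SCHEMES_2_TO_12 = sum(comb(13, k) for k in range(2, 13))
N_SCHEMES_3_AND_4 = comb(13, 3) + comb(13, 4)

# Hypothetical allele profiles, for illustration only
toy_profiles = {
    "s1": {"A": 1, "B": 1, "C": 1},
    "s2": {"A": 1, "B": 2, "C": 1},
    "s3": {"A": 2, "B": 2, "C": 1},
    "s4": {"A": 2, "B": 1, "C": 2},
}
```

The two counts reproduce the numbers of combinations reported in the Results: 8,177 for schemes of 2 to 12 loci and 1,001 for schemes of 3 and 4 loci. In the toy data, only the pair of loci A and B separates all four strains.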

Results 

PCR amplification and sequencing 

All 13 gene fragments were successfully amplified using identical PCR reaction conditions (see Methods),
which generated discrete PCR fragments. PCR amplification of the 13 targets was also applied to an extended
panel of 90 isolates: more than 98% of reactions were positive, yielding strong amplicons with no
non-specific products (data not shown). We obtained amplicons of the expected length for all the assayed
targets and for all the examined strains. Amplification for various DNA template concentrations was assayed
via serial dilution. No difference in PCR amplification was observed when DNA amounts from 20 to 100 ng were
used. A total of 5,121 bp of sequence data were analysed for each strain (Table 2). There were no gaps in
sequences. The number of polymorphic sites (Table 3) for each of the different fragments varied from 8
(STPP2) to 40 (Met-II). STPP2 showed the lowest discriminatory power (describing just 5 different genotypes
in the dataset). RB19 was the fragment with the highest discriminatory power, identifying 21 distinct
genotypes in the dataset.

Optimized scheme for MLST 

Initially, Neighbor Joining trees were generated from concatenated sequences across the 13 prescreened loci,
which identified four monophyletic DTUs with robust bootstrap support (TcI, TcII, TcIII, TcIV, bootstrap
>98%). TcVI was also monophyletic but with relatively low support (Figure 1). Additionally, TcV was
paraphyletic with Mn cl2 as an outlier. The concatenated 13 fragments differentiated all 25 reference
strains in terms of DSTs. We observed that bootstrap values were slightly different between the two methods
(SNP duplication and average states) as they manage heterozygous sites differently. Values were higher for
the SNP duplication method in most branches (Figure 1, branch values highlighted in blue) as a consequence
of base duplication, which modifies the alignment and increases the informative sites used for
bootstrapping. To avoid the potential for methodologically elevated bootstraps, the average states method
was implemented for further analyses. From the selected 13 loci, all possible combinations of 2 to 12 loci
were analysed (8,177 combinations) by implementing the MLSTest scheme optimisation algorithm. One
combination of 7 loci was the best according to the proposed criteria. This combination consisted of RB19,
TcMPX, HMCOAR, RHO1, GPI, SODB and LAP, discriminating all 25 strains as DSTs and, importantly, resolving
each DTU as a monophyletic group. DTUs were also well supported by the associated bootstrap values (TcI,
100; TcII, 100; TcIII, 99.8; TcIV, 88.2; TcV, 88.7; TcVI, 99.6) as illustrated in Figure 2. Combinations
with a higher number of loci (from 8 to 12) did not significantly increase the bootstrap values of TcIV and
TcV.

Table 3. T. cruzi MLST targets.

| Gene fragment | No. of genotypes | No. of polymorphic sites | Typing efficiency (a) | DP |
|---|---|---|---|---|
| GPI*† | 9 | 18 | 0.500 | 0.889 |
| HMCOAR*† | 15 | 20 | 0.750 | 0.954 |
| RHO1*† | 13 | 23 | 0.565 | 0.914 |
| TcMPX*† | 11 | 12 | 0.917 | 0.905 |
| LAP* | 13 | 16 | 0.812 | 0.942 |
| SODB* | 12 | 9 | 1.333 | 0.914 |
| RB19* | 21 | 26 | 0.808 | 0.985 |
| GPX | 12 | 16 | 0.750 | 0.908 |
| PDH | 11 | 15 | 0.733 | 0.920 |
| GTP | 10 | 18 | 0.556 | 0.905 |
| SODA | 10 | 10 | 1.000 | 0.880 |
| STPP2 | 5 | 8 | 0.625 | 0.585 |
| Met-II | 19 | 40 | 0.475 | 0.978 |

DP: Discriminatory Power according to [53];
(a) Number of genotypes per polymorphic site;
*Included in the seven loci scheme;
†Included in the four loci scheme.
doi:10.1371/journal.pntd.0003117.t003

We assessed whether the outlier for TcV (Mn cl2) and the low bootstrap observed for TcVI (applied to all 13
fragments) were due to incongruence among fragments. The thirteen-fragment dataset was significantly
incongruent (BIONJ-ILD p-value<0.001) for at least one partition, which was corroborated using NJ-LILD with
a permutation test and 500 replications. Significant incongruence (p-value<0.05 after Bonferroni correction)
was detected in the TcV and TcVI nodes. Incongruence was likely due to strains within DTUs TcV and TcVI
demonstrating apparent loss of heterozygosity (LOH) in the Met-II fragment. Excluding Met-II, the p-value for
ILD was not significant (BIONJ-ILD p-value = 0.33) and the bootstrap values for TcV and TcVI exceeded 85%.
Furthermore, tree topology was congruent with expected DTU assignment.



Reduced scheme for DTU assignment 

Attempts were made to reduce the number of fragments required for DTU assignment while maintaining DST
identification. All combinations of 3 and 4 fragments (1,001 combinations) from the panel of 13 fragments
were analysed as described above. A reduced MLST panel incorporating TcMPX, HMCOAR, RHO1 and GPI (four loci)
produced the highest bootstrap values for DTU assignment across the DTUs, TcI (99.9), TcII (100), TcIII
(99.5), TcIV (86.7), TcV (100) and TcVI (96.8) (Figure 3), and discriminated 19 of 25 DSTs. Other
combinations showed higher discriminatory power but lower bootstrap values (data not shown). The TcMPX locus
exhibits an apparent loss of heterozygosity (LOH) in the hybrid DTU TcV, retaining the TcII-like allele but
not the TcIII allele. Therefore, DTU assignment using TcMPX alone would not assign a TcV isolate correctly.
However, concatenation of TcMPX with HMCOAR, RHO1 and GPI allows TcV to be distinguished from TcII.

Inter and intra DTU phylogenies 

Topologies obtained for the 7 and 4 loci combinations (Figures 2 and 3, respectively) were similar to the 13
loci scheme, consistently showing the two major groups (TcI-TcIII-TcIV and TcII-TcV-TcVI) supported by high
bootstrap values, even when trees were rooted using TcMB7 (Figure 1). The primary difference between the 13
target concatenated phylogenies and the trees obtained for the 7 and 4 targets was that for the 13
concatenated targets TcV was paraphyletic, showing the Mn cl2 strain as an outlier. Regarding inter-DTU
relationships, the analysis of the concatenated 13 fragments divided DTUs into two major clusters: one
composed of TcI, TcIII and TcIV, with a bootstrap value of 100%, while the remaining group containing TcII,
TcV and TcVI was supported by lower bootstrap values (<70%), possibly due to the presence of the two hybrid
DTUs (TcV and TcVI) (Figure 1). Within clusters, internal topologies were supported with relatively high but
variable bootstrap values with 4, 7 and 13 loci combinations and generally consistent intralineage
topologies (Figures 1, 2 and 3), although the panel of 25 reference strains would need to be expanded
further for assessment of fine scale intralineage associations.

Figure 1. Neighbor Joining tree based on the concatenation of 13 MLST fragments. Different DTUs are
represented by vertical bars. Branch values represent bootstrap values (1000 replications); the two values
on each branch correspond to the method of handling heterozygous sites: SNP duplication method (first value)
and average states (second value). Branch supports highlighted in blue show branches where support for the
SNP duplication method was higher than for the average states method. The outlier TcV is highlighted in red.
Scale bar at the bottom left represents uncorrected p-distances.
doi:10.1371/journal.pntd.0003117.g001

Discussion 

Thirteen gene fragments, drawn from two recently and independently proposed schemes [50,51], were assessed
to derive an optimised MLST scheme. Here we evaluated the optimal combination of loci based on three main
sequential criteria: first, assignment to the expected DTU; second, robust bootstrap values for the six
major DTUs; and third, detection of intra-DTU diversity. For the first time we propose an optimised MLST
scheme, validated against a panel representing all known lineages, for characterisation of T. cruzi
isolates. However, it should be emphasized that this MLST scheme is proposed as a typing method for T. cruzi
isolates and not as a typing method to be used directly on biological samples such as blood, tissues or
triatomine feces, for which more sensitive and simpler methods are needed. Moreover, we have performed
assays to determine the limit of detection of each gene fragment in blood and triatomine feces (data not
shown) and we found that none of these targets is suitable for detecting T. cruzi at the concentrations
normally found in natural biological samples.

As a result of our data analyses, we obtained one combination of 7 loci and one combination of only 4
targets which most closely adhered to the selection criteria described above. It is worth noting that the
three criteria used for selecting the optimum combination of targets are sequential, that is, they are
applied in hierarchical order. First, we seek monophyly for the six DTUs and accurate lineage assignment of
each examined strain. Second, we seek robust bootstrap values for each of the six major DTUs. Finally, we
seek to detect genetic diversity at the intra-DTU level. Because of this hierarchical order, the selected
combinations optimise the number of DSTs only subject to the two preceding criteria. In principle,
therefore, these criteria can select a combination of loci that does not give the maximum number of DSTs for
a given DTU, because the algorithm first prioritises monophyly and strong bootstrap values for the six DTUs.
This was the case for the selected 4-loci scheme (which differentiated 19 of 25 strains). Nevertheless, the
selected 7-loci combination that we propose allows us to differentiate the 25 examined strains, i.e. the
maximum possible number of DSTs. The results illustrate that MLST is a highly discriminatory strain-typing
technique. From these data we suggest that the 7 locus scheme provides scope for both lineage assignment and
diversity studies, generating robust bootstrap values for distance based phylogenies, and that a reduced
panel of only four targets is sufficient for assignment to DTU level. For population genetics scale analyses
and detailed epidemiological studies, a comprehensive larger panel of T. cruzi isolates should be assessed
by sequencing the proposed targets.

Figure 2. Neighbor Joining tree based on the concatenation of 7 selected MLST fragments: RB19, TcMPX,
HMCOAR, RHO1, GPI, SODB and LAP. Different DTUs are represented by vertical bars. Branch values represent
bootstrap values (1000 replications). Heterozygous sites were considered as average states (see Methods).
Scale bar at the bottom left represents uncorrected p-distances.
doi:10.1371/journal.pntd.0003117.g002

The phylogenetic associations among DTUs TcI, TcII, TcIII and TcIV are debatable. Split affinities and
incongruence have been observed in nuclear phylogenies [7,8,51,56]. One interpretation of phylogenetic
incongruence is genetic recombination, although due to the highly plastic nature of the T. cruzi genome
other causes are also possible. Mutation rates and gene conversion may create distinct levels of sequence
diversity [57]. Here, concatenated phylogenies showed a partition into two main clusters for all gene
combinations tested, the first consisting of TcI, TcIII and TcIV (bootstrap value = 100%), and the second
composed of TcII, TcV and TcVI (bootstrap value <70%). The presence of the two known hybrid lineages (TcV
and TcVI) generated artifactual phylogenetic structuring, and excluding these representatives revealed
clustering of DTUs TcI, TcIII and TcIV, indicating that TcI has a closer affinity to TcIII than to TcIV.
TcII is the most genetically distant group, which is in agreement with previous findings [9,10,51]. In
addition, it would be interesting in future to analyze the new lineage described as TcBat [58] using the
MLST scheme proposed here, since it could shed light on the phylogenetic position of this lineage.

LOH observed in Met-II and TcMPX gene fragments affecting 
the hybrid lineages TcV and TcVI has potentially significant 
consequences for MLST and lineage assignment [51]. Affected 
isolates retain the TcII-like allele and would be misassigned in 
single locus characterisation. For example, TcV hybrid isolates 
would be assigned to TcII based on TcMPX sequencing due to 
apparent LOH. Despite this LOH, the TcMPX locus was included 
in the 4 target scheme to increase bootstrap support in 
differentiating TcV from TcVI. 

Figure 3. Neighbor Joining tree based on the concatenation of 
4 selected MLST fragments (TcMPX, HMCOAR, RHO1, GPI) for 
DTU assignment. Different DTUs are represented by vertical bars. 
Branch values represent bootstrap values (1000 replications). 
Heterozygous sites were handled using the average states method. Scale bar 
at the bottom left represents uncorrected p-distances. 
doi:10.1371/journal.pntd.0003117.g003 

Although MLST has been successfully applied to other diploid 
organisms including Candida albicans, the potential for heterozygous 
alleles complicates typing schemes. In the present work, two methods 
to handle heterozygous sites, the SNP duplication and average states 
algorithms, produced broadly similar results, with SNP duplication 
producing marginally higher bootstraps due to the physical 
duplication of informative sites. Here we decided to implement the 
average states methodology to derive genetic distances and phylogenies. 
Both approaches are available in the software MLSTest [52], 
producing results that enable resolution at the DTU level and an 
associated DP of 1 for the panel tested. A significant advantage of 
MLST based analysis over sequential PCR based gels is that, once 
generated, sequences can be applied to a range of complementary 
downstream analyses. For example, the resolution of haplotypes for 
recombination analysis and investigation of more detailed evolution- 
ary associations can be applied to population sized studies. At present, 
whole genome sequencing applied to large numbers of isolates is not 
feasible and microsatellite analysis is often difficult to reproduce 
precisely across laboratories, unlike MLST which has proven 
reproducibility both within and between laboratories [59]. However, 
microsatellites could be more convenient for population genetics 
studies at a microevolutionary level, due to their high resolution 
power. A further consideration in the analysis of diploid sequences is 
differentiating heterozygosity from copy number diversity. Ideally, we 
should prefer single copy genes for MLST schemes in order to avoid 
comparisons among paralogues. We performed in silico analyses in 
order to estimate the copy number of the selected targets on the 
genomic data of CL-Brener (TcVI) and Sylvio X10 (TcI) 
(http://tritrypdb.org/tritrypdb/). For these analyses, we used as query the 
primer sequences as well as the complete fragment sequences. These 
searches displayed just single matches in all cases. Consequently, we 
propose that all the examined MLST fragments may be considered as 
single copy genes, at least for typing and clustering. 
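
The single-copy check described above can be approximated with a simple exact-match count. The Python sketch below is only illustrative: it assumes the genome is available as a plain string and uses naive exact matching, which is a swapped-in simplification and not the search procedure applied to the TriTrypDB genomic data. The function names are hypothetical. It counts occurrences of a query (a primer or a whole fragment) on both strands.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def count_hits(genome, query):
    """Count exact, possibly overlapping, matches of query on both strands."""
    genome, query = genome.upper(), query.upper()
    total = 0
    # A set avoids double counting when the query is its own reverse complement.
    for q in {query, reverse_complement(query)}:
        start = genome.find(q)
        while start != -1:
            total += 1
            start = genome.find(q, start + 1)
    return total
```

Under this simplification, a count of 1 for a query is consistent with a single-copy target, while higher counts would flag a candidate for closer inspection.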

One of the most important aspects in any MLST scheme is to 
provide targets that consistently produce PCR amplicons requiring 
minimal cleanup and are suitable for sequencing. Although in the 
current protocol we recommend purifying PCR products with a 
suitable commercial kit (Qiagen), in most cases this was not 
necessary and sequencing was performed directly from the PCR 
product. The exception was TcGPXII, and very occasionally 
SODA produced nonspecific products; neither of these targets is 
included in the final recommended panels. Although the two 
previously published MLST schemes [50,51] showed promise in 
identifying diversity, some of the gene targets were not amenable 
to routine use. For example, LYT1 was discarded due to 
unreliable amplification and DHFR-TS due to the need for 
internal primers. Therefore, the further optimisation performed here 
was necessary for practical use. An important criterion for 
choosing targets was identifying those that used the same primers 
for both PCR amplification and sequencing to maintain simplicity 
and reduce costs. 

Taken together, we propose an MLST scheme validated against 
a panel representing all of the known lineages of T. cruzi. We 
propose that a 7 loci MLST scheme could provide the basis for 
robust DTU assignment and strain diversity studies of new isolates 
and a reduced 4 loci scheme for lineage assignment. Importantly, 
the sequence data generated can be utilised for a wide range of 
downstream analyses, including the resolution of haplotypes for 
recombination analysis, population genetics analyses, and other 
Finally, we propose that the seven-fragment MLST scheme 
could be used as a gold standard for T. cruzi typing, against which 
other typing approaches, particularly single locus approaches or 
systematic PCR assays based on amplicon size, could be 
compared. 